9.4 Likelihood


correctness of a model. Since that appears to introduce a wildly fluctuating subjectivity into the calculations, it seems more reasonable to regard that as a fatal weakness of the method.18

To reiterate: our purpose is to find the most likely explanation of a set of observations, that is, a description that is simpler, hence shorter, than the set of facts observed to have occurred.19

The three pillars of statistical inference are as follows:

1. A statistical model: that part of the description that is not (at least at present) in question (corresponding to K in Eq. 6.13).

2. The data: that which has been observed or measured (unconditional information).

3. The statistical hypothesis: the attribution of particular values to the unknown parameters of the model that are under investigation (conditional information).

The preferred values of those parameters are then those that maximize the likelihood of the model, likelihood being defined as follows:

Definition. The likelihood L(H|R) of the hypothesis H given data R and a specific model is proportional to P(R|H), the constant of proportionality being arbitrary but constant in any one application (i.e., with the same model and the same data, but different hypotheses).
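
As a concrete illustration (not from the text), the following is a minimal Python sketch under an assumed binomial model: invented data of 7 successes in 10 trials, with the hypotheses being candidate values of the success probability p. The preferred value is the one maximizing L(H|R), the proportionality constant being taken as 1.

    from math import comb

    # Model (pillar 1): a binomial experiment of n independent trials.
    # Data (pillar 2): r = 7 successes observed in n = 10 trials (invented).
    # Hypotheses (pillar 3): candidate values of the success probability p.
    n, r = 10, 7

    def likelihood(p):
        # L(H|R) is proportional to P(R|H); the proportionality
        # constant is taken here as 1.
        return comb(n, r) * p**r * (1 - p)**(n - r)

    hypotheses = [0.3, 0.5, 0.7, 0.9]
    for p in hypotheses:
        print(f"p = {p}: L = {likelihood(p):.4f}")
    print("preferred value:", max(hypotheses, key=likelihood))  # 0.7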

The arbitrariness of the constant of proportionality is of no concern since, in practice, likelihood ratios are taken, as in the following.

Definition. The likelihood ratio of two hypotheses on some data is the ratio of their likelihoods on that data. It will be denoted as L(H1, H2|R). The likelihood ratios of two hypotheses on independent sets of data may be multiplied together to form the likelihood ratio on the combined data:

L(H1, H2|R1 & R2) = L(H1, H2|R1) × L(H1, H2|R2) .    (9.50)
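
A small sketch of Eq. 9.50, again under the assumed binomial model: the likelihood ratio computed on the pooled data equals the product of the ratios on the two independent data sets (the binomial coefficients cancel in each ratio, which is why simply pooling the counts works here).

    from math import comb

    def lik(p, n, r):
        # P(R|H) under the binomial model, used as L(H|R).
        return comb(n, r) * p**r * (1 - p)**(n - r)

    def lik_ratio(p1, p2, n, r):
        # L(H1, H2|R): the ratio of the two likelihoods on the same data.
        return lik(p1, n, r) / lik(p2, n, r)

    p1, p2 = 0.7, 0.5                        # hypotheses H1 and H2
    (n1, r1), (n2, r2) = (10, 7), (20, 13)   # independent data sets R1, R2

    product = lik_ratio(p1, p2, n1, r1) * lik_ratio(p1, p2, n2, r2)
    combined = lik_ratio(p1, p2, n1 + n2, r1 + r2)  # ratio on R1 & R2
    print(product, combined)                 # equal, as Eq. 9.50 asserts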

The fundamental difference between probability and likelihood is that in the inverse probability approach R is variable and H constant, whereas in likelihood, H is variable and R constant. In other words, likelihood is predicated on a fixed R.
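
The same point in code (an illustration, still assuming the binomial model): the one function P(R|H) can be read two ways. With H fixed and R varying it is a probability distribution and sums to 1; with R fixed and H varying it is a likelihood function, with no such normalization.

    from math import comb

    def P(r, p, n=10):
        # P(R|H): probability of r successes in n trials at success rate p.
        return comb(n, r) * p**r * (1 - p)**(n - r)

    # Probability: H fixed (p = 0.7), R variable; the values sum to 1.
    print(sum(P(r, 0.7) for r in range(11)))               # ~1.0

    # Likelihood: R fixed (r = 7), H variable; no reason to sum to 1.
    print(sum(P(7, p) for p in (0.1, 0.3, 0.5, 0.7, 0.9)))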

We shall sometimes need to recall that if R1 and R2 are two possible, mutually exclusive, results and P{R|H} is the probability of obtaining the result R given H, then

P{R1 or R2|H} = P{R1|H} + P{R2|H}    (9.51)
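
For instance (continuing the assumed binomial example), the mutually exclusive results R1 = "exactly 7 successes" and R2 = "exactly 8 successes" satisfy Eq. 9.51:

    from math import comb

    def P(r, p=0.7, n=10):
        # P{R|H}: probability of exactly r successes.
        return comb(n, r) * p**r * (1 - p)**(n - r)

    # "7 or 8 successes" has probability P{R1|H} + P{R2|H}.
    print(P(7) + P(8))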

18 As Fisher and others have pointed out, it is not strictly correct to associate Bayes with the inverse probability method. Bayes’ doubts as to its validity led him to withhold publication of his work (it was published posthumously).

19 Sometimes brevity is taken as the main criterion. This is the minimum description length (MDL) approach. See also the discussion in Sects. 7.4 and 11.5.